Training Set Selection for Building Compact and Efficient Language Models
Authors
Abstract
Similar Resources
Building Compact N-gram Language Models Incrementally
In traditional n-gram language modeling, we collect the statistics for all n-grams observed in the training set up to a certain order. The model can then be pruned down to a more compact size with some loss in modeling accuracy. One of the more principled methods for pruning the model is the entropy-based pruning proposed by Stolcke (1998). In this paper, we present an algorithm for incremental...
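The entropy-based pruning criterion referenced here can be illustrated with a toy sketch. The data structures, symbols, and threshold below are illustrative assumptions, not Stolcke's exact formulation; a full implementation must also renormalize the backoff weights after removing an n-gram.

```python
import math

def pruning_cost(p_bigram, p_backoff, p_history):
    # Approximate relative-entropy increase from dropping the explicit
    # bigram estimate P(w|h) and falling back to the backed-off P(w),
    # weighted by how often the history h occurs.
    return p_history * p_bigram * math.log(p_bigram / p_backoff)

def prune(bigrams, threshold):
    # Keep only bigrams whose removal would raise model entropy
    # by more than `threshold`; the rest are pruned away.
    return {hw: probs for hw, probs in bigrams.items()
            if pruning_cost(*probs) > threshold}

# Toy model: (history, word) -> (P(w|h), backoff P(w), P(h))
bigrams = {
    ("the", "cat"): (0.20, 0.01, 0.05),   # far above its backoff estimate
    ("the", "of"):  (0.011, 0.01, 0.05),  # barely above: cheap to prune
}
kept = prune(bigrams, threshold=1e-4)
```

Raising the threshold prunes more aggressively, trading modeling accuracy for a more compact model.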
Efficient Subsampling for Training Complex Language Models
We propose an efficient way to train maximum entropy language models (MELM) and neural network language models (NNLM). The advantage of the proposed method comes from a more robust and efficient subsampling technique. The original multi-class language modeling problem is transformed into a set of binary problems where each binary classifier predicts whether or not a particular word will occur. ...
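The binary reformulation described in this abstract can be sketched roughly as follows. The corpus format, the `neg_rate` parameter, and the uniform subsampling scheme are illustrative assumptions, not the paper's exact method:

```python
import random

def to_binary_examples(corpus, target_word, neg_rate, seed=0):
    # Turn multi-class next-word prediction into one binary task:
    # "does `target_word` occur after this context?" All positives are
    # kept; negatives are subsampled at rate `neg_rate` (a simplified
    # stand-in for the paper's more robust subsampling technique).
    rng = random.Random(seed)
    examples = []
    for context, word in corpus:
        if word == target_word:
            examples.append((context, 1))
        elif rng.random() < neg_rate:
            examples.append((context, 0))
    return examples

corpus = [("the quick", "fox"), ("a lazy", "dog"),
          ("the sly", "fox"), ("my old", "cat")]
ex = to_binary_examples(corpus, "fox", neg_rate=0.5)
```

One such binary classifier per vocabulary word replaces the single expensive multi-class normalization over the whole vocabulary.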
Data Selection for Compact Adapted SMT Models
Data selection is a common technique for adapting statistical translation models to a specific domain, and it has been shown both to improve translation quality and to reduce model size. Selection relies on some in-domain data from the same domain as the texts expected to be translated. Selecting the sentence-pairs that are most similar to the in-domain data from a pool of parallel texts has bee...
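One common instantiation of this similarity-based selection is cross-entropy difference scoring with small language models; the sketch below uses add-one smoothed unigram models and toy data, which are our assumptions rather than this paper's method:

```python
import math
from collections import Counter

def avg_logprob(tokens, counts, total, vocab):
    # Add-one smoothed unigram log-probability, averaged per token.
    return sum(math.log((counts[t] + 1) / (total + vocab))
               for t in tokens) / len(tokens)

def select(pool, in_domain, general, k):
    # Rank pool sentences by how much more likely they are under the
    # in-domain LM than under the general LM, then keep the top k.
    vocab = len({w for s in in_domain + general for w in s})
    c_in = Counter(w for s in in_domain for w in s)
    c_gen = Counter(w for s in general for w in s)
    n_in, n_gen = sum(c_in.values()), sum(c_gen.values())
    ranked = sorted(pool,
                    key=lambda s: avg_logprob(s, c_in, n_in, vocab)
                                - avg_logprob(s, c_gen, n_gen, vocab),
                    reverse=True)
    return ranked[:k]

# Toy example: pick the pool sentence closest to the biomedical domain.
in_domain = [["gene", "protein"], ["protein", "cell"]]
general = [["stock", "market"], ["market", "price"]]
pool = [["gene", "cell"], ["stock", "price"]]
```

Training the adapted model only on the top-ranked pairs yields a smaller model than training on the full pool.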
Training Set and Filters Selection for the Efficient Use of Multispectral Acquisition Systems
The quality of the results obtained from a multispectral acquisition system can be affected by several factors, including the training set on which the system characterization model relies and the optical filters that allow acquisition in different bands of the light spectrum. In this paper, we investigate the joint effect of training set and filter selection on the results of a typical multisp...
Compact Maximum Entropy Language Models
In language modeling we are always confronted with a sparse-data problem. The Maximum Entropy formalism makes it possible to fully integrate complementary statistical properties of limited corpora. The focus of the present paper is twofold. The new smoothing technique of LM-induced marginals is introduced and discussed. We then highlight the advantages resulting from a combination of robust features and s...
Journal
Journal title: IEICE Transactions on Information and Systems
Year: 2009
ISSN: 0916-8532,1745-1361
DOI: 10.1587/transinf.e92.d.506